crash report
The Case for Negative Data: From Crash Reports to Counterfactuals for Reasonable Driving
Patrikar, Jay, Sharma, Apoorva, Veer, Sushant, Li, Boyi, Scherer, Sebastian, Pavone, Marco
Learning-based autonomous driving systems are trained mostly on incident-free data, offering little guidance near safety-performance boundaries. Real crash reports contain precisely the contrastive evidence needed, but they are hard to use: narratives are unstructured, third-person, and poorly grounded to sensor views. We address these challenges by normalizing crash narratives to ego-centric language and converting both logs and crashes into a unified scene-action representation suitable for retrieval. At decision time, our system adjudicates proposed actions by retrieving relevant precedents from this unified index; an agentic counterfactual extension proposes plausible alternatives, retrieves for each, and reasons across outcomes before deciding. On a nuScenes benchmark, precedent retrieval substantially improves calibration, with recall on contextually preferred actions rising from 24% to 53%. The counterfactual variant preserves these gains while sharpening decisions near risk.
Stack Trace-Based Crash Deduplication with Transformer Adaptation
Mamun, Md Afif Al, Uddin, Gias, Xia, Lan, Zhang, Longyu
Automated crash reporting systems generate large volumes of duplicate reports, overwhelming issue-tracking systems and increasing developer workload. Traditional stack trace-based deduplication methods, whether they rely on string similarity, rule-based heuristics, or deep learning (DL) models, often fail to capture the contextual and structural relationships within stack traces. We propose dedupT, a transformer-based approach that models stack traces holistically rather than as isolated frames. Extensive experiments on real-world datasets show that dedupT outperforms existing DL and traditional methods (e.g., sequence alignment and information retrieval techniques) in both duplicate ranking and unique crash detection, significantly reducing manual triage effort. On four public datasets, dedupT often improves Mean Reciprocal Rank (MRR) by over 15% compared to the best DL baseline and by up to 9% over traditional methods, while achieving higher Receiver Operating Characteristic Area Under the Curve (ROC-AUC) in detecting unique crash reports. Our work advances the integration of modern natural language processing (NLP) techniques into software engineering, providing an effective solution for stack trace-based crash deduplication. Software issues are generally reported through (1) human-submitted reports and (2) automated crash reports. Human-reported issues typically include textual descriptions detailing the issue and the expected and observed behavior, and may include attachments such as images or videos. In contrast, automated crash reports are generated by crash reporting tools (e.g., Sentry). However, these automated systems often overwhelm issue-tracking system (ITS) platforms by generating numerous duplicate crash reports for the same issue, requiring developers to manually review and triage them, which is a time-consuming process.
For instance, Mozilla Firefox received 2.2 million issues in the first week of 2016, the majority being duplicates [1], while 72% of crash reports in the IntelliJ Platform were found to be duplicates [2]. In such scenarios, grouping similar crashes together is essential, a process known as crash deduplication. Unlike human-written reports with detailed descriptions, automated crash reports primarily contain technical data like stack traces and crash dumps. Figure 1: Example of a Java stack trace. Figure 1: Example of a C++ stack trace.
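The Mean Reciprocal Rank (MRR) metric mentioned in the abstract above can be illustrated with a short sketch. This is not the dedupT evaluation code. It is a minimal, generic implementation of the standard MRR definition, and the function name `mean_reciprocal_rank`, the crash group identifiers, and the sample rankings are all hypothetical.

```python
from typing import Hashable, Iterable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant: Sequence[Iterable[Hashable]],
) -> float:
    """Average, over queries, of 1/rank of the first relevant candidate.

    A query whose ranking contains no relevant candidate contributes 0.
    """
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        rel_set = set(rel)
        for position, candidate in enumerate(ranked, start=1):
            if candidate in rel_set:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical data: three incoming crash reports, each with a ranked list of
# candidate crash groups and the set of groups it truly duplicates.
rankings = [["g7", "g2", "g9"], ["g1", "g4", "g3"], ["g5", "g6", "g8"]]
relevant = [{"g2"}, {"g1"}, {"g0"}]

mrr = mean_reciprocal_rank(rankings, relevant)
print(f"MRR = {mrr:.3f}")  # (1/2 + 1/1 + 0) / 3 = 0.500
```

Under this definition, a deduplication model scores higher when the correct crash group appears closer to the top of each ranked list, which is why MRR is a natural fit for the duplicate-ranking task.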
Code Researcher: Deep Research Agent for Large Systems Code and Commit History
Singh, Ramneet, Joel, Sathvik, Mehrotra, Abhav, Wadhwa, Nalin, Bairi, Ramakrishna B, Kanade, Aditya, Natarajan, Nagarajan
Large Language Model (LLM)-based coding agents have shown promising results on coding benchmarks, but their effectiveness on systems code remains underexplored. Due to the size and complexity of systems code, making changes to a systems codebase is a daunting task, even for humans. It requires researching many pieces of context, derived from the large codebase and its massive commit history, before making changes. Inspired by the recent progress on deep research agents, we design the first deep research agent for code, called Code Researcher, and apply it to the problem of generating patches for mitigating crashes reported in systems code. Code Researcher performs multi-step reasoning about semantics, patterns, and commit history of code to gather sufficient context. The context is stored in a structured memory which is used for synthesizing a patch. We evaluate Code Researcher on kBenchSyz, a benchmark of Linux kernel crashes, and show that it significantly outperforms strong baselines, achieving a crash-resolution rate of 58%, compared to 37.5% by SWE-agent. On average, Code Researcher explores 10 files in each trajectory whereas SWE-agent explores only 1.33 files, highlighting Code Researcher's ability to deeply explore the codebase. Through another experiment on an open-source multimedia software project, we show the generalizability of Code Researcher. Our experiments highlight the importance of global context gathering and multi-faceted reasoning for large codebases.
CrashAgent: Crash Scenario Generation via Multi-modal Reasoning
Li, Miao, Ding, Wenhao, Lin, Haohong, Lyu, Yiqi, Yao, Yihang, Zhang, Yuyou, Zhao, Ding
Training and evaluating autonomous driving algorithms requires a diverse range of scenarios. However, most available datasets predominantly consist of normal driving behaviors demonstrated by human drivers, resulting in a limited number of safety-critical cases. This imbalance, often referred to as a long-tail distribution, restricts the ability of driving algorithms to learn from crucial scenarios involving risk or failure, scenarios that are essential for humans to develop driving skills efficiently. To generate such scenarios, we utilize Multi-modal Large Language Models to convert crash reports of accidents into a structured scenario format, which can be directly executed within simulations. Specifically, we introduce CrashAgent, a multi-agent framework designed to interpret multi-modal real-world traffic crash reports for the generation of both road layouts and the behaviors of the ego vehicle and surrounding traffic participants. We comprehensively evaluate the generated crash scenarios from multiple perspectives, including the accuracy of layout reconstruction, collision rate, and diversity. The resulting high-quality and large-scale crash dataset will be publicly available to support the development of safe driving algorithms in handling safety-critical situations.
CrashFixer: A crash resolution agent for the Linux kernel
Mathai, Alex, Huang, Chenxi, Ma, Suwei, Kim, Jihwan, Mitchell, Hailie, Nogikh, Aleksandr, Maniatis, Petros, Ivančić, Franjo, Yang, Junfeng, Ray, Baishakhi
Code large language models (LLMs) have shown impressive capabilities on a multitude of software engineering tasks. In particular, they have demonstrated remarkable utility in the task of code repair. However, common benchmarks used to evaluate the performance of code LLMs are often limited to small-scale settings. In this work, we build upon kGym, which shares a benchmark for system-level Linux kernel bugs and a platform to run experiments on the Linux kernel. This paper introduces CrashFixer, the first LLM-based software repair agent that is applicable to Linux kernel bugs. Inspired by the typical workflow of a kernel developer, we identify the key capabilities an expert developer leverages to resolve a kernel crash. Using this as our guide, we revisit the kGym platform and identify key system improvements needed to practically run LLM-based agents at the scale of the Linux kernel (50K files and 20M lines of code). We implement these changes by extending kGym to create an improved platform, called kGymSuite, which will be open-sourced. Finally, the paper presents an evaluation of various repair strategies for such complex kernel bugs and showcases the value of explicitly generating a hypothesis before attempting to fix bugs in complex systems such as the Linux kernel. We also evaluated CrashFixer's capabilities on still-open bugs, and found at least two patch suggestions considered plausible to resolve the reported bug.
AVOID: Autonomous Vehicle Operation Incident Dataset Across the Globe
Zheng, Ou, Abdel-Aty, Mohamed, Wang, Zijin, Ding, Shengxuan, Wang, Dongdong, Huang, Yuxuan
Crash data from autonomous vehicles (AV) or vehicles equipped with advanced driver assistance systems (ADAS) are key to understanding the nature of crashes and to enhancing automation systems. However, most existing crash data sources are either limited in sample size or suffer from missing or unverified data. To contribute to the AV safety research community, we introduce AVOID: an open AV crash dataset. Three types of vehicles are considered: Advanced Driving System (ADS) vehicles, Advanced Driver Assistance Systems (ADAS) vehicles, and low-speed autonomous shuttles. The crash data are collected from the National Highway Traffic Safety Administration (NHTSA), the California Department of Motor Vehicles (CA DMV), and incident news worldwide, and the data are manually verified and summarized in a ready-to-use format. In addition, land use, weather, and geometry information are also provided. The dataset is expected to accelerate research on AV crash analysis and potential risk identification by providing the research community with rich samples, diverse data sources, a clear data structure, and high data quality.
11 new deaths tied to semi-autonomous driving systems
Eleven additional people were killed in U.S. crashes involving vehicles that were using automated driving systems during a four-month period earlier this year, according to newly released government data, part of an alarming pattern of incidents linked to the technology. Ten of the deaths involved vehicles made by Tesla, though it is unclear from the National Highway Traffic Safety Administration's data whether the technology itself was at fault or whether driver error might have been responsible. The 11th death involved a Ford pickup truck. The deaths included four crashes involving motorcycles that occurred during the spring and summer: Two in Florida and one each in California and Utah.
Musk said not one self-driving Tesla had ever crashed. By then, regulators already knew of 8
Elon Musk has long used his mighty Twitter megaphone to amplify the idea that Tesla's automated driving software isn't just safe -- it's safer than anything a human driver can achieve. That campaign kicked into overdrive last fall when the electric-car maker expanded its Full Self-Driving "beta" program from a few thousand people to a fleet that now numbers more than 100,000. The $12,000 feature purportedly lets a Tesla drive itself on highways and neighborhood streets, changing lanes, making turns and obeying traffic signs and signals. As critics scolded Musk for testing experimental technology on public roads without trained safety drivers as backups, Santa Monica investment manager and vocal Tesla booster Ross Gerber was among the allies who sprang to his defense. "There has not been one accident or injury since FSD beta launch," he tweeted in January.